Estimating the Empirical Null Distribution of Maxmean Statistics in Gene Set Analysis
Abstract
Gene Set Analysis (GSA) is a framework for testing the association between a set of genes and an outcome, e.g. disease status or treatment group. The method relies on computing a maxmean statistic for each gene set and estimating the null distribution of the maxmean statistics via a restandardization procedure. In practice, genes within a pre-determined gene set are more strongly correlated with one another than with genes in other sets, which may bias the estimated null distribution. We derive an asymptotic null distribution of the maxmean statistics under a sparsity assumption. We then propose a flexible two-group mixture model for the maxmean statistics, which allows us to estimate the null parameters empirically by maximum likelihood. Our empirical method is compared with the restandardization procedure of GSA in simulations. We show that our method estimates the null density more accurately when the genes are strongly correlated within gene sets.
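The abstract does not spell out the maxmean statistic, so the following is only a minimal sketch of one common form of it: average the positive parts and the negative parts of the gene-level scores in a set separately, then take the larger of the two averages. The function name `maxmean`, the unsigned return value, and the example scores are assumptions made for illustration and are not taken from the paper; some implementations attach a sign to the result.

```python
import numpy as np


def maxmean(z):
    """Unsigned maxmean of the gene-level scores z for one gene set.

    Illustrative sketch only: the name and the unsigned convention are
    assumptions, not the paper's code.
    """
    z = np.asarray(z, dtype=float)
    # Average of the positive parts, taken over all genes in the set.
    pos_mean = np.maximum(z, 0.0).mean()
    # Average of the magnitudes of the negative parts, over all genes.
    neg_mean = np.maximum(-z, 0.0).mean()
    # The statistic is whichever one-sided average is larger.
    return float(max(pos_mean, neg_mean))


# Hypothetical gene-level scores for a set of four genes.
example_scores = [2.0, -1.0, 0.5, -0.5]
print(maxmean(example_scores))
```

Because both averages divide by the full set size, a few large scores in one direction are not cancelled by small scores in the opposite direction, which is the usual motivation for preferring this statistic over the plain mean.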
Related articles
Estimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq
Genome and transcriptome studies using microarray and RNA-seq technologies often involve simultaneous hypothesis testing of thousands of genes or transcripts. A key step determining significant differential expression in such large-scale testing is obtaining the null distribution of the test statistics. We show by examples that the asymptotic null is often inappropriate for many of the χ² tests ...
Empirical phi-divergence test statistics for testing simple and composite null hypotheses
The main purpose of this paper is to introduce first a new family of empirical test statistics for testing a simple null hypothesis when the vector of parameters of interest are defined through a specific set of unbiased estimating functions. This family of test statistics is based on a distance between two probability vectors, with the first probability vector obtained by maximizing the empiri...
A New Approximation for the Null Distribution of the Likelihood Ratio Test Statistics for k Outliers in a Normal Sample
Usually when performing a statistical test or estimation procedure, we assume the data are all observations of i.i.d. random variables, often from a normal distribution. Sometimes, however, we notice in a sample one or more observations that stand out from the crowd. These observation(s) are commonly called outlier(s). Outlier tests are more formal procedures which have been developed for detec...
Estimating the null distribution for conditional inference and genome-scale screening
In a novel approach to the multiple testing problem, Efron (2004; 2007) formulated estimators of the distribution of test statistics or nominal p-values under a null distribution suitable for modeling the data of thousands of unaffected genes, non-associated single-nucleotide polymorphisms, or other biological features. Estimators of the null distribution can improve not only the empirical Bayes...
Impact of Outliers in Data Envelopment Analysis
This paper examines the relationship between "Data Envelopment Analysis" and the statistical concept of an "outlier". Data envelopment analysis (DEA) is a method for estimating the relative efficiency of decision making units (DMUs) that perform similar tasks in a production system, using multiple inputs to produce multiple outputs. An important issue in statistics is to identify the outliers. In this pap...
Publication date: 2017